
JMIR Public Health and Surveillance

JMIR Publications Inc.

Preprints posted in the last 90 days, ranked by how well they match JMIR Public Health and Surveillance's content profile, based on 45 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.

1
AI-Driven Feature Selection Using Only Survey Variable Descriptions: Large Language Models Identify Adolescent Vaping Predictors

Zhang, K.; Zhao, Z.; Hu, Y.; Le, T.

2026-03-09 health informatics 10.64898/2026.03.06.26347816 medRxiv
Top 0.1%
14.7%

Objective: To evaluate the effectiveness of various large language models (LLMs) in identifying reliable predictors of Electronic Nicotine Delivery Systems (ENDS) initiation among adolescents, using only variable descriptions from a large-scale survey. Methods: A cohort of 7,943 tobacco-naive adolescents aged 12-16 years from the Population Assessment of Tobacco and Health (PATH) Study was analyzed to predict ENDS use at wave 5. Four instruction-tuned LLMs (GPT-4o, LLaMA 3.1-70B, Qwen 2.5-72B-Instruct, and DeepSeek-V3) were systematically evaluated for text-based feature selection using only variable descriptions from wave 4.5. The selected features were used to train LightGBM classifiers, whose performance was compared with a baseline. Results: The four instruction-tuned LLMs were notably consistent, with substantial overlap in the top predictors each model identified. The selected variables spanned critical domains such as peer and household influence, risk perception, and exposure to tobacco-related cues. LightGBM classifiers trained on PATH wave 4.5-5 data using the LLM-selected features showed strong predictive performance. Notably, Qwen 2.5-72B-Instruct achieved an AUC of 0.791 with 30 predictors, surpassing the baseline AUC of 0.768. Discussion: The substantial overlap among the top predictors identified by different LLMs suggests a shared reasoning process, despite differences in model architecture and training. LightGBM classifiers trained on these LLM-selected features performed comparably to, or better than, models trained on the full set of survey variables, underscoring the quality of features selected solely from textual descriptions. These findings are also consistent with previous tobacco regulatory research, further supporting the effectiveness of LLM-driven feature selection. 
Conclusion: Instruction-tuned large language models can effectively perform text-based feature selection using survey variable descriptions alone, without access to raw survey data. This scalable, interpretable, and privacy-preserving framework holds promise for behavioral health research and tobacco use surveillance.
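
The abstract reports "substantial overlap" in the predictors chosen by the four LLMs without stating how overlap was measured. One common option is the Jaccard index between each pair of top-k feature sets; the minimal Python sketch below assumes that choice, and the model keys and variable names are hypothetical placeholders rather than PATH variable codes.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two feature sets: size of intersection / size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(selections):
    """Return {(model_i, model_j): Jaccard} for every pair of models."""
    return {
        (m1, m2): jaccard(selections[m1], selections[m2])
        for m1, m2 in combinations(sorted(selections), 2)
    }


# Hypothetical top-5 selections; names are placeholders, not PATH variable codes.
selections = {
    "model_a": ["peer_use", "household_use", "harm_perception", "ad_exposure", "curiosity"],
    "model_b": ["peer_use", "household_use", "harm_perception", "ad_exposure", "school_grades"],
    "model_c": ["peer_use", "household_use", "curiosity", "ad_exposure", "sensation_seeking"],
}

for pair, score in pairwise_overlap(selections).items():
    print(pair, round(score, 3))
```

Jaccard ignores rank order within the top-k lists; a rank-aware measure would be needed if the ordering of predictors matters.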

2
Improvement in Albuminuria Screening Associated with EHR Decision Support Change

Zafar, W.; Tavares, S.; Hu, Y.; Brubaker, L.; Green, J.; Mehta, S.; Grams, M. E.; Chang, A. R.

2026-02-14 health informatics 10.64898/2026.02.09.26345709 medRxiv
Top 0.1%
10.1%

Background: Albuminuria is associated with increased risk of cardiovascular disease (CVD), heart failure, and progression of chronic kidney disease (CKD). Early detection of albuminuria through spot urine albumin-creatinine ratio (UACR) testing enables more accurate risk stratification and timely use of preventive therapies, yet UACR testing remains unacceptably low among people with hypertension. Methods: We evaluated two EHR-embedded clinical decision support (CDS) strategies at Geisinger Health System aimed at increasing UACR testing in individuals with hypertension: an OurPractice Advisory (OPA) from Jan 2022 to Aug 2022, and a Health Maintenance Topic (HMT) in the Care Gaps section of Storyboard from Aug 2022 onward. We evaluated UACR testing rates from 2020 to 2023 in Geisinger primary care and compared them with a control group of healthcare systems in the Optum Labs Data Warehouse (OLDW). Patients were excluded if they had UACR testing in the preceding 3 years, had diabetes or CKD, or were receiving palliative/hospice care. Results: We included 58,876 individuals in Geisinger (mean age 59.4 years, 49.6% female) and 1,427,754 in OLDW (mean age 61.0 years, 49% female). UACR testing in Geisinger (2.97% in 2020; 2.8% in 2021; 9.7% in 2022; 17.5% in 2023) increased significantly compared with the control health systems (2.08%, 2.26%, 3.35%, and 3.40%, respectively). Results were consistent after adjusting for age, sex, and race. Conclusion: The OPA increased UACR testing ~3-fold, whereas the HMT was associated with further improvement (~6-fold vs. baseline) among individuals with hypertension, suggesting an important role for CDS design in closing care gaps.
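
As a rough check of the fold changes quoted in the Conclusion, the crude testing rates in the Results can be divided by a baseline year. This sketch assumes 2020 as the baseline (the abstract does not say which baseline was used) and ignores the adjustment for age, sex, and race.

```python
# Crude UACR testing rates (%) as reported in the abstract.
geisinger = {2020: 2.97, 2021: 2.80, 2022: 9.70, 2023: 17.50}
control = {2020: 2.08, 2021: 2.26, 2022: 3.35, 2023: 3.40}


def fold_change(rates, baseline_year=2020):
    """Ratio of each year's testing rate to the baseline year's rate (unadjusted)."""
    base = rates[baseline_year]
    return {year: rate / base for year, rate in rates.items()}


for name, rates in [("Geisinger", geisinger), ("Control", control)]:
    fc = fold_change(rates)
    print(name, {year: round(v, 2) for year, v in fc.items()})
```

With 2020 as the baseline, Geisinger's ratios come to about 3.3 (2022) and 5.9 (2023), in line with the "~3-fold" and "~6-fold" figures, while the control systems rise to only about 1.6.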

3
Data-Driven Hybrid Model of SARIMA-CNNAR For Tuberculosis Incidence Time Series Analysis in Nepal

Singh, D. B.; Dawadi, P. R.; Dangi, Y.

2026-02-24 health informatics 10.64898/2026.02.22.26346853 medRxiv
Top 0.1%
8.5%

Background: Tuberculosis (TB) remains a major public health challenge in Nepal, with incidence rates substantially higher than global estimates. Accurate forecasting of TB incidence is essential for early warning systems, resource allocation, and targeted interventions. This study aimed to develop and validate a hybrid Seasonal Autoregressive Integrated Moving Average (SARIMA) and Convolutional Neural Network Auto-Regressive (CNNAR) model for TB incidence forecasting in Nepal. Methods: Monthly TB incidence data (January 2015 to December 2024) were obtained from the National Tuberculosis Control Center (NTCC), Nepal. A hybrid SARIMA-CNNAR model was developed, in which SARIMA modeled linear seasonal trends and CNNAR captured nonlinear patterns in the residuals. Hyperparameters were optimized using grid search with 5-fold cross-validation. Model performance was evaluated on the 2024 test set using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and R². Structural break analysis and sensitivity analysis assessed model robustness. The hybrid model was compared against standalone SARIMA and CNNAR and three state-of-the-art benchmarks: Long Short-Term Memory (LSTM), Facebook Prophet, and XGBoost. Results: TB incidence in Nepal increased from a monthly average of 2,048 cases in 2015 to 3,447 in 2024 (a 68.4% increase). The hybrid SARIMA-CNNAR model performed strongly, with test set metrics of MAE=248.35, RMSE=294.31, MAPE=7.2%, and R²=0.79. Comparator performance: CNNAR (MAE=251.08, RMSE=336.55, MAPE=7.7%, R²=0.73); LSTM (MAE=267.91, RMSE=324.55, MAPE=7.5%, R²=0.75); XGBoost (MAE=314.74, RMSE=373.99, MAPE=8.5%, R²=0.66); Prophet (MAE=371.15, RMSE=478.40, MAPE=10.4%, R²=0.45); SARIMA (MAE=401.11, RMSE=503.93, MAPE=10.99%, R²=0.39). All models captured seasonal peaks in March-May and July-August, with forecasts for 2025 indicating continued seasonal patterns. 
Sensitivity analysis confirmed robustness, with <5% metric variation across parameter configurations. Conclusions: This first validated hybrid model for TB prediction in Nepal demonstrates high forecasting accuracy by integrating linear seasonal modeling with nonlinear pattern detection. The approach offers a robust tool for evidence-based public health planning in resource-limited settings and is suitable for integration into national surveillance systems. Author Summary: Tuberculosis remains a major public health challenge in Nepal, with cases increasing substantially over the past decade. In this study, we developed a computer model that combines two forecasting approaches to predict monthly TB cases: one that captures regular seasonal patterns and another that learns complex trends from data. Using ten years of national surveillance data, our hybrid model achieved high accuracy in forecasting TB incidence, outperforming standard approaches including SARIMA, Prophet, CNNAR, LSTM neural networks, and XGBoost. The model successfully predicted seasonal peaks in March-May and July-August, with forecasts for 2025 suggesting continued high case numbers. These predictions can help Nepal's health authorities prepare by pre-positioning diagnostic supplies, scheduling additional staff during peak months, and targeting awareness campaigns. The modeling approach is designed to be adaptable to other diseases and countries with similar health data.
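
The hybrid design described in the Methods is additive: SARIMA produces a linear seasonal forecast, a second model (here CNNAR) forecasts SARIMA's residuals, and the two forecasts are summed. The sketch below shows only that combination step and the four reported test-set metrics; fitting SARIMA or CNNAR is omitted, and the numbers in the example are toy values, not Nepal surveillance data.

```python
import math


def hybrid_forecast(linear_part, residual_part):
    """Additive hybrid: linear (e.g. SARIMA) forecast plus a forecast of its residuals."""
    return [lin + res for lin, res in zip(linear_part, residual_part)]


def forecast_metrics(actual, predicted):
    """MAE, RMSE, MAPE (%) and R^2 for paired observations (assumes no zero actuals)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Toy example: three test months.
combined = hybrid_forecast([90, 210, 280], [5, -5, 10])
print(combined, forecast_metrics([100, 200, 300], combined))
```

Because RMSE squares the errors, it penalizes large misses more than MAE does, which is why a model can rank similarly on MAE but differ on RMSE (as CNNAR and the hybrid do in the Results).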

4
Generation of Synthetic Data in Health Surveys Using Large Language Models

Villarreal-Zegarra, D.; Bellido-Boza, L.

2026-01-30 health informatics 10.64898/2026.01.27.26345015 medRxiv
Top 0.1%
8.2%

Background: Generating synthetic data using artificial intelligence, such as large language models (LLMs), is a useful strategy in public health because it can reduce time and costs, expand access to data, and facilitate information sharing without compromising confidentiality. Objective: To evaluate the consistency and psychometric plausibility of synthetic data generated by an LLM to simulate the responses of survey participants (user personas) in a national health survey in Peru. Methods: We conducted a cross-sectional study based on the National Health Satisfaction Survey (ENSUSALUD 2016) of ambulatory health service users. We used the GPT-OSS-20B model to generate synthetic responses in Spanish, conditioned on narrative profiles derived from sociodemographic and clinical variables. We evaluated consistency between responses and profile characteristics (sex, age, and comorbidities) using performance metrics (accuracy, precision, recall, F1 score, and AUC). We compared distributions between real and synthetic data using t-tests and chi-square tests. For latent variables, we conducted confirmatory factor analyses of the PHQ-9, PHQ-8, and GAD-7 (WLSMV; polychoric matrices) and estimated internal consistency (α and ω). We examined normality (Jarque-Bera test) and stability through correlations between real measures (PHQ-2 and EQ-5D) and synthetic measures (PHQ-2, PHQ-8, PHQ-9, GAD-2, and GAD-7). Results: The model showed strong concordance with the profile for sex, age, and chronic disease status, with metrics close to 1 for most variables; overall consistency was high in the vast majority of cases. The synthetic PHQ-9, PHQ-8, and GAD-7 instruments showed optimal factor fit and high internal consistency. Synthetic measures were positively and significantly correlated with the real PHQ-2 and negatively correlated with EQ-5D, with moderate to high correlations, particularly for PHQ-8/PHQ-9 and GAD-7. 
Conclusions: An LLM can generate plausible synthetic data for health surveys when its output is conditioned on user personas, preserving high coherence with demographic and clinical characteristics and maintaining adequate psychometric properties in depression and anxiety scales. However, relevant deviations were identified (e.g., overestimation of obesity, unexpected distributions in some variables, and missing values in a sensitive item), which supports the need for rigorous validation and bias control before using these data for inferential purposes or public policy.
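
The consistency check in the Methods can be read as a classification problem in which the persona's profile value (e.g., has a chronic disease) is the ground truth and the LLM's synthetic answer is the prediction. A minimal sketch of the four threshold-based metrics under that reading follows; AUC is omitted, and the example vectors are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one binary profile attribute.

    y_true: attribute value stated in the persona profile.
    y_pred: the same attribute as reported in the synthetic response.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical: profile says "chronic disease" (1) or not (0) vs. the synthetic answer.
profile = [1, 1, 0, 0, 1, 0]
synthetic = [1, 0, 0, 0, 1, 1]
print(binary_metrics(profile, synthetic))
```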

5
Estimating the mpox vaccine uptake among MSM and modelling the potential of future vaccination campaigns in the EU/EEA

Prasse, B.; Hansson, D.; Aphami, L.; Jonas, K. J.; Borrel Pique, J.; Andrianou, X.; Pharris, A.; Plachouras, D.; Schmidt, A. J.; Nerlander, L.

2026-04-18 public and global health 10.64898/2026.04.16.26350851 medRxiv
Top 0.1%
7.2%

In October 2025, mpox virus clade I infections were detected among men who have sex with men (MSM) in the EU/EEA, suggesting local transmission in MSM sexual networks. Given the large outbreak of mpox among MSM in 2022 and the uncertain transmission parameters of clade I in the European context, clade I poses a public health concern for the EU/EEA. This work assesses the potential effect of increasing mpox vaccine uptake among MSM through two contributions. First, building on the European MSM and Trans Persons Internet Survey 2024, we estimate mpox vaccine uptake among MSM, as well as the proportion who are unvaccinated but willing to be vaccinated, for 28 countries in the EU/EEA. Specifically, we fit Bayesian mixed-effects models for an individual's vaccination and recovery status as a function of their number of sexual partners and country. Second, we develop a susceptible-infectious-recovered model on a sexual contact network to estimate the reduction in the reproduction number if vaccines are provided to MSM who are willing to be vaccinated. Our results suggest substantial willingness among MSM to be vaccinated against mpox if mpox cases increase, and a large reduction in the effective reproduction number if this willingness is met. These findings highlight a large potential for increasing mpox vaccine uptake among MSM and preventing future mpox outbreaks in the EU/EEA.
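
The abstract does not describe how the sexual contact network or the reproduction number was constructed. For intuition only, a standard result for SIR dynamics on a configuration-model network gives the branching factor R = T * sum_k k(k-1) P(k) phi_k / <k>, where T is the per-partnership transmissibility, P(k) the distribution of partner numbers, and phi_k the unvaccinated fraction among people with k partners. The sketch assumes a perfectly protective vaccine and a locally tree-like network; it is not the authors' model, and the degree distribution in the example is invented.

```python
def effective_r(degree_dist, transmissibility, unvaccinated=None):
    """Branching factor of SIR on a configuration-model network (illustrative only).

    degree_dist: {k: P(k)}, distribution of number of sexual partners.
    transmissibility: per-partnership probability of transmission T.
    unvaccinated: optional {k: fraction of degree-k people left susceptible};
                  degrees not listed are assumed fully unvaccinated.
    Assumes a perfectly protective vaccine and a locally tree-like network.
    """
    if unvaccinated is None:
        unvaccinated = {}
    mean_k = sum(k * p for k, p in degree_dist.items())
    # Following a partnership, infection reaches a degree-k person with
    # probability k P(k) / <k>; if that person is susceptible, they can pass
    # infection on along their remaining k - 1 partnerships.
    excess = sum(
        k * (k - 1) * p * unvaccinated.get(k, 1.0) for k, p in degree_dist.items()
    )
    return transmissibility * excess / mean_k


# Invented distribution: most people have few partners, a minority have many.
dist = {1: 0.5, 2: 0.3, 10: 0.2}
print(effective_r(dist, 0.2))              # no vaccination
print(effective_r(dist, 0.2, {10: 0.5}))   # half of the high-degree group vaccinated
```

Because high-degree individuals dominate the k(k-1) term, vaccinating them has a disproportionate effect on R, which is one reason partner-number-dependent uptake (as modeled in the abstract) matters.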

6
A unified modeling platform for informing cervical cancer prevention policy decisions in 132 low- and middle-income countries

Man, I.; Macacu, A.; Eynard, M.; Adhikari, I.; Gini, A.; Georges, D.; Baussano, I.

2026-03-20 public and global health 10.64898/2026.03.18.26348700 medRxiv
Top 0.1%
7.1%

Background: Public health decision modelling tools designed to inform cervical cancer prevention policies in low- and middle-income countries (LMICs) are useful but scarce. Key challenges are that cervical cancer epidemiological data are often missing or inconsistently collected, and that there is no systematic approach to dealing with such data limitations. Methodology/Principal Findings: We developed a unified modelling platform and workflow, based on the previously developed footprinting approach, to enable cervical cancer modelling in 132 LMICs through the following steps: 1) Using sexual behavior data from the Demographic and Health Surveys (DHS), which were available for a large number of LMICs (70/132), we identified clusters of countries representing distinct patterns of human papillomavirus (HPV) transmission. The 7 resulting clusters correspond to a gradient of HPV prevalence and cervical cancer risk and show clear geographical separation. 2) The remaining LMICs were classified into the identified clusters based on geographical proximity, so that every LMIC was assigned to a cluster. Goodness of classification was validated using available epidemiological data. 3) We then calibrated the HPV transmission and cervical cancer progression models of the IARC/WHO METHIS platform to the 132 LMICs, first by cluster and then by country, using the available data on sexual behavior (from DHS), HPV prevalence (from a literature search), and cervical cancer incidence (from GLOBOCAN). Conclusions/Significance: A unified workflow and platform designed by IARC/WHO for public health decision modelling of cervical cancer prevention in 132 LMICs is now available. It is ready to support global and local stakeholders in coordinating, designing, and implementing impactful and efficient prevention policies, and will help accelerate cervical cancer elimination.
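
Step 2 assigns countries without DHS data to clusters "based on geographical proximity", but the exact rule is not given. A simple reading is a nearest-neighbour rule on country centroids using great-circle distance; the sketch below assumes that rule, and the country labels and coordinates in the example are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def assign_by_proximity(clustered, unclustered):
    """Give each unclustered country the cluster of its nearest clustered country.

    clustered: {country: (lat, lon, cluster_id)}, e.g. countries with DHS data.
    unclustered: {country: (lat, lon)}, countries to be classified.
    """
    assignment = {}
    for country, (lat, lon) in unclustered.items():
        nearest = min(
            clustered,
            key=lambda c: haversine_km(lat, lon, clustered[c][0], clustered[c][1]),
        )
        assignment[country] = clustered[nearest][2]
    return assignment


# Hypothetical centroids.
clustered = {"A": (0.0, 0.0, 1), "B": (0.0, 50.0, 2)}
print(assign_by_proximity(clustered, {"X": (1.0, 5.0), "Y": (-2.0, 45.0)}))
```

A nearest-centroid rule is the simplest choice; a shared-border or region-based rule would be equally consistent with the abstract's wording.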

7
The Influence of Polypharmacy on Type 2 Diabetes Adverse Cardiovascular Outcomes in a Rural Cohort

Li, J. W.; Crew, L. A.; Cox, T. M.; Canine, B. F.

2026-04-03 endocrinology 10.64898/2026.04.02.26350053 medRxiv
Top 0.1%
7.1%

Objective: In this study, we used a large-scale clinical database to evaluate the relationship between polypharmacy and adverse outcomes among patients with type 2 diabetes in rural Montana, to inform strategies that improve adherence, reduce preventable complications, and promote equitable diabetes care in underserved regions. Research Design and Methods: A total of 591 patients with type 2 diabetes and a medication history from the Big Sky Care Connect Database (BSCC) were stratified into 3 cohorts by number of prescribed medications: 1-4 medications (non-polypharmic), 5-9 medications (polypharmic), and ≥10 medications (hyperpolypharmic). Each cohort was examined for Major Adverse Cardiovascular Events (MACE) and the Diabetes Complication Severity Index (DCSI). Descriptive statistics and multivariate logistic, linear, and Poisson regression analyses were performed. Results: Medication count was associated with male gender (β = -2.1341, p < 0.001). Both medication count (IRR 1.06 per additional medication, p < 0.001) and age (IRR 1.03 per year, p < 0.001) were significant predictors of MACE. The prevalence of neuropathy and nephropathy differed significantly across cohorts (p < 0.001) and increased with medication count.
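
The cohort cut-points and the per-medication IRR can be made concrete in a few lines. Under the log-linear Poisson model implied by an IRR, effects compound multiplicatively, so k additional medications multiply the expected MACE rate by IRR^k with other covariates held fixed; the sketch assumes that model form and uses hypothetical function names.

```python
def polypharmacy_cohort(n_medications):
    """Cohort label from the abstract: 1-4, 5-9, or >=10 prescribed medications."""
    if n_medications < 1:
        raise ValueError("cohorts in the abstract start at one medication")
    if n_medications <= 4:
        return "non-polypharmic"
    if n_medications <= 9:
        return "polypharmic"
    return "hyperpolypharmic"


def cumulative_irr(irr_per_unit, units):
    """Multiplicative change in the expected event rate for `units` additional units,
    assuming a log-linear (Poisson) model with other covariates held fixed."""
    return irr_per_unit ** units


print(polypharmacy_cohort(7), round(cumulative_irr(1.06, 5), 2))
```

For example, five additional medications correspond to 1.06^5 ≈ 1.34, an expected MACE rate roughly 34% higher, all else being equal.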

8
Persistent Proxy Discrimination in HIV Testing Prediction Models: A National Fairness Audit of 386,775 US Adults

Farquhar, H.

2026-03-16 health informatics 10.64898/2026.01.27.26344936 medRxiv
Top 0.1%
7.0%

Background: In clinical contexts where disease burden differs across demographic groups, enforcing demographic parity (equal prediction rates regardless of group) may reduce screening for the populations that need it most. We demonstrate this using HIV testing prediction as a case study. Methods: Using the Behavioral Risk Factor Surveillance System (BRFSS) 2024 dataset (N=386,775), we trained four classifiers to predict HIV testing uptake and evaluated disparities using demographic parity difference (DPD), equalized odds difference (EOD), and calibration across eight racial/ethnic groups. We applied threshold optimization and exponentiated gradient mitigation and quantified their impact on high-burden populations, including intersectional effects across race and sex. Results: Baseline selection rates ranged from 12.1% (Asian) to 66.0% (Black), mirroring differential HIV burden (DPD 0.519-0.634). Race-blind models retained 70% of the baseline disparity through correlated social determinants. Enforcing demographic parity reduced the true positive rate for Black individuals from 78.2% to 30.0% (a 61.6% relative decrease), resulting in 1,610 additional individuals being missed. Race-only optimization worsened sex-based disparity by 71%; multi-objective optimization reduced intersectional DPD from 0.609 to 0.076, but at the same cost to high-burden groups. With exponentiated gradient mitigation, AUC fell from 0.671 to 0.592 (an 11.8% relative decrease). Survey-weighted sensitivity analysis confirmed that unweighted estimates underestimated disparities. Conclusions: Demographic parity is an inappropriate fairness criterion in differential-burden clinical contexts because it reduces screening access for high-risk populations. Fairness audits in healthcare should use need-appropriate metrics (equalized odds, calibration) rather than defaulting to demographic parity, and metric selection should involve deliberation with clinicians and community stakeholders.
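
The fairness metrics named in the Methods can be computed from per-group rates. The sketch below follows the convention used by libraries such as Fairlearn (DPD as the largest gap in selection rates; EOD as the larger of the true-positive-rate gap and the false-positive-rate gap) and assumes binary 0/1 labels with every group containing both positives and negatives. It also reproduces the relative decreases quoted in the Results; the example labels are hypothetical.

```python
def rates_by_group(y_true, y_pred, groups):
    """Selection rate, TPR and FPR per group.

    Assumes binary 0/1 labels and that every group has both positives and negatives.
    """
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        stats[g] = {
            "selection": sum(y_pred[i] for i in idx) / len(idx),
            "tpr": sum(y_pred[i] for i in pos) / len(pos),
            "fpr": sum(y_pred[i] for i in neg) / len(neg),
        }
    return stats


def demographic_parity_difference(stats):
    """Largest minus smallest selection rate across groups."""
    sel = [s["selection"] for s in stats.values()]
    return max(sel) - min(sel)


def equalized_odds_difference(stats):
    """Larger of the TPR gap and the FPR gap across groups."""
    tpr = [s["tpr"] for s in stats.values()]
    fpr = [s["fpr"] for s in stats.values()]
    return max(max(tpr) - min(tpr), max(fpr) - min(fpr))


def relative_decrease(before, after):
    return (before - after) / before


print(round(100 * relative_decrease(78.2, 30.0), 1))    # TPR drop reported for Black adults
print(round(100 * relative_decrease(0.671, 0.592), 1))  # AUC drop under exponentiated gradient
```

In the toy labels used for testing, the two groups have identical TPRs but different FPRs, so DPD (0.25) and EOD (0.5) disagree; this is the kind of divergence that makes the choice of criterion consequential.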

9
Leveraging large language models to address common vaccination myths and misconceptions

Reis, F.; Bayer, L. J.; Malerczyk, C.; Lenz, C.; von Eiff, C.

2026-03-02 health informatics 10.64898/2026.02.27.26347254 medRxiv
Top 0.1%
6.8%

Large language models (LLMs) are increasingly used by the public to seek health information, yet their reliability in addressing common vaccine myths remains unclear. We conducted an exploratory multi-vendor evaluation of three LLMs (GPT-5, Gemini 2.5 Flash, Claude Sonnet 4) using officially curated vaccination myths from Germany's public health institution and two realistic user framings as prompts: a curious skeptic and a convinced believer. All model responses were independently evaluated by two blinded medical experts for whether the misconception was addressed (binary), scientific accuracy, and communication clarity (5-point Likert scales). Additionally, blinded marketing experts ranked the models for lay communication clarity, and Flesch-Kincaid Reading Ease scores were computed for all outputs. Across all myths, prompts, and models (11 x 2 x 3 = 66 rating items), medical raters judged that every response successfully refuted the misinformation. Scientific accuracy and clarity ratings were high and tightly clustered (median 4.0-4.5), with no combined score below 3 and substantial inter-rater agreement. Marketing experts independently ranked Gemini 2.5 Flash and GPT-5 highest for lay clarity, with Claude Sonnet 4 consistently less favored. Readability analysis revealed generally low accessibility, particularly for the convinced believer framing and for Claude Sonnet 4 outputs. Our findings suggest that current general-purpose LLMs can deliver accurate debunking of widely documented vaccine myths under realistic conditions, but that linguistic complexity and framing-sensitive style may limit accessibility. Careful integration of LLMs into public health channels, alongside transparent sourcing and readability optimization, could enable these models to be used as scalable tools for debunking vaccine myths.
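
The Flesch Reading Ease score used in the readability analysis is a fixed function of average sentence length and average syllables per word: 206.835 - 1.015 x (words/sentences) - 84.6 x (syllables/words). The sketch takes the counts as inputs, because syllable counting is tool-specific and the abstract does not name the tool used. Note that this is the Reading Ease formula; the Flesch-Kincaid Grade Level is a separate formula.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts (original English-language coefficients).

    Higher scores mean easier text: roughly 60-70 is plain English,
    and scores below 30 are very difficult.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Long sentences with polysyllabic words score lower (harder) than short, simple ones.
print(round(flesch_reading_ease(100, 5, 150), 3))
print(round(flesch_reading_ease(100, 10, 120), 3))
```

The syllables-per-word term carries the larger coefficient, so replacing technical vocabulary tends to raise the score more than shortening sentences does.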

10
TDA Engine v2.1: A Computational Framework for Detecting Structural Voids in Spatially Censored Epidemiological Data with Temporal Classification and Causal Inference

Mboya, G. O.

2026-03-05 health informatics 10.64898/2026.02.01.26345283 medRxiv
Top 0.1%
6.7%

Background: In public health surveillance, silence (the absence of data) is often more significant than the signal. Traditional epidemiological mapping tools efficiently visualize data density but struggle to define data absence mathematically. Standard approaches conflate stochastic sparsity with systemic suppression and remain vulnerable to edge effects. Methods: We introduce a topological framework that detects structural voids: regions of unexpected data absence within clusters. Using Distance-to-Measure (DTM) filtration with adaptive thresholding via the Kneedle algorithm [11], we eliminate arbitrary parameter choices. Version 2.1 extends the original framework with three methodological additions: (1) a temporal void classifier combining the Fano factor and a two-state Hidden Markov Model (HMM) to distinguish persistent structural silence from stochastic fluctuation across reporting periods; (2) a causal taxonomy (BORDER, ACCESS, INFRASTRUCTURE, SYSTEM, UNKNOWN) that maps detected voids to probable reporting failure mechanisms via covariate decision trees; and (3) an Observed-to-Expected (O/E) completeness engine calibrated against WHO-standard disease incidence rates across seven conditions. Parameters are derived geometrically from the DTM distribution itself. We validate against known ground truth through a censoring simulation framework using public Kenyan health facility data. Detection accuracy is quantified using the Jaccard index [12], centroid error, and recovery rate. Results: TDA Engine achieves Jaccard = 0.82 (95% CI: 0.74-0.89) on simulated suppression events, significantly outperforming KDE (0.45) and relative risk surfaces (0.38). Centroid error is 342 m (IQR: 187-512 m). The temporal classifier correctly labels 91% of structurally silent units across six-period validation datasets (HMM posterior P(structural) ≥ 0.60). 
Permutation tests yield p = 0.003 (95% CI: 0.001-0.008) [13], confirming statistical significance beyond complete spatial randomness. Conclusion: TDA Engine v2.1 provides a mathematically rigorous, topology-based framework for detecting structural voids in censored epidemiological data and classifying them by temporal persistence and probable causal mechanism. By shifting from density-based to geometry-based inference with quantitative validation metrics and causal labelling, we enable public health officials to distinguish between natural gaps and potential suppression, and to direct field investigation resources accordingly. We emphasize that structural voids are geometric anomalies consistent with suppression, not proof of it, and they require contextual validation.
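
Two of the quantities in this abstract are simple to state. The Jaccard index compares a detected void with the simulated suppression region as the size of their intersection over the size of their union, and the Fano factor is the variance-to-mean ratio of a unit's counts across reporting periods. The sketch assumes regions are represented as sets of grid cells and uses the population variance; both choices are assumptions, and the HMM component is omitted.

```python
from statistics import mean, pvariance


def jaccard_index(detected, truth):
    """Size of intersection / size of union for two sets of grid cells (or facility IDs)."""
    detected, truth = set(detected), set(truth)
    union = detected | truth
    return len(detected & truth) / len(union) if union else 1.0


def fano_factor(counts):
    """Variance-to-mean ratio of counts across reporting periods.

    About 1 for Poisson-like reporting; values well above 1 indicate overdispersion.
    A unit reporting zero in every period has mean 0, so the ratio is undefined
    and NaN is returned.
    """
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")


detected = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 0), (1, 1), (2, 1)}
print(jaccard_index(detected, truth), fano_factor([0, 0, 12, 0, 0, 12]))
```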

11
Impact of prescription-free access to sexually transmitted infection screening tests in medical-biological laboratories: cross-sectional analysis of data from clinical laboratories in France.

Gil-Salcedo, A.; Gazzano, V.; Arsene, S.; Durand, A.; Roger, S.; Prots, L.; Laurencin, N.; Chanard, E.; Duez, A.; Le Naour, E.; Bausset, O.; Ghali, B.; Strzelecki, A.-C.; Felloni, C.; Levillain, R.; Fargeat, C.; Lefrancois, S.; Feuerstein, D.; Visseaux, B.; Escudie, L.; Visseaux, C.; Leclerc, C.; Haim-Boukobza, S.

2026-04-24 public and global health 10.64898/2026.04.23.26351562 medRxiv
Top 0.1%
6.7%

Background: Since September 2024, France has implemented a national reform allowing prescription-free access (PFA) to sexually transmitted infection (STI) screening in medical biological laboratories (MBLs). This study aims to characterize the populations undergoing STI testing according to access modality and to evaluate the probability of test positivity in relation to testing pathway, sex, and age group. Methods: We conducted a cross-sectional analysis of all individuals screened for Chlamydia trachomatis, gonorrhoea, human immunodeficiency virus (HIV), hepatitis B virus (HBV), and syphilis (by treponemal-specific immunoassay, TSI) in Cerballiance MBLs between March 2025 and February 2026. Multivariable logistic regression models stratified by sex and adjusted for age and region assessed associations between screening modality and STI positivity. Results: Among the 1,008,737 individuals included, 27.8% were tested under PFA and 72.2% under prescription-based access (PBA). PFA users were more frequently male (47.4% vs. 36.3%, p<0.001) and aged 20-39 years (34.0%, p<0.001). Overall positivity rates differed by modality: PFA was associated with higher detection of Chlamydia (4.6% vs. 3.6%), whereas the PBA group had higher positivity for syphilis (3.4% vs. 1.2%), HBV (1.3% vs. 0.4%), and HIV (0.3% vs. 0.2%; all p<0.001). Co-infection and gonorrhoea proportions did not differ significantly between modalities. Conclusions: PFA substantially increased STI screening uptake, particularly among young adults and men, and enhanced detection of bacterial STIs. PBA remains essential for diagnosing viral and chronic infections. These findings highlight the complementary roles of both access strategies and support PFA screening as an effective public health intervention to broaden STI detection and reduce transmission.

12
Comparative Evaluation of Logistic Regression and Gradient Boosting Models for Influenza Outbreak Early-Warning Using U.S. CDC ILINet Surveillance Data (2010-2025)

Onwuameze, C. N.; Madu, V.

2026-03-13 health informatics 10.64898/2026.03.05.26347655 medRxiv
Top 0.1%
6.5%

Background: Timely detection of seasonal influenza outbreaks is critical for healthcare system preparedness and public health response. Although numerous studies have examined short-term influenza forecasting, fewer have operationalized prediction as a binary early-warning problem linked to actionable surveillance thresholds. This study evaluated the performance of traditional and machine learning models for detecting national influenza outbreak weeks using U.S. Centers for Disease Control and Prevention (CDC) ILINet surveillance data. Methods: Weekly national ILINet data from 2010-2025 were analyzed. Outbreak weeks were defined as those in which weighted influenza-like illness (ILIPERCENT) exceeded the 90th percentile of the 2010-2017 training distribution (threshold = 3.3932%). Predictors included three-week lags of ILIPERCENT and percent positive laboratory specimens, along with seasonal harmonic terms. Models were trained on 2010-2017 data and evaluated on a temporally held-out 2020-2025 test period. Performance metrics included area under the receiver operating characteristic curve (AUC), precision-recall area under the curve (PR-AUC), sensitivity, specificity, precision, and F1-score. Findings: On the 2020-2025 test set, logistic regression achieved an AUC of 0.9964 and PR-AUC of 0.9868, with sensitivity of 1.0000 and specificity of 0.9516. XGBoost achieved an AUC of 0.9946 and PR-AUC of 0.9812, with sensitivity of 0.8939 and specificity of 0.9798. Both models demonstrated near-perfect discrimination between outbreak and non-outbreak weeks under strict temporal validation. Interpretation: National influenza outbreak early-warning can be implemented using publicly available CDC surveillance data with high discriminatory accuracy. Framing prediction as a threshold-based outbreak detection problem strengthens operational relevance and supports integration of predictive analytics into routine influenza surveillance and preparedness planning. 
Author Summary: Seasonal influenza places a heavy burden on hospitals and communities each year, yet public health officials often rely on surveillance reports that describe what has already happened rather than signaling when activity is about to intensify. We examined whether routinely collected U.S. influenza surveillance data could be used to detect outbreak conditions earlier and more clearly. Using national data from the Centers for Disease Control and Prevention (CDC) covering 2010 to 2025, we compared a traditional statistical model with a machine learning approach to determine how accurately each could identify weeks when influenza activity exceeded a predefined outbreak threshold. Both approaches performed extremely well when tested on recent seasons, correctly distinguishing outbreak from non-outbreak weeks with high accuracy. Importantly, this framework translates weekly surveillance data into a practical alert signal rather than simply producing numerical forecasts. By linking model output to a clear outbreak definition, health departments and healthcare systems could use similar tools to support timely planning, communication, and resource allocation during influenza season.
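
The outbreak definition and predictor set in the Methods translate into a small preprocessing step. This sketch assumes that "three-week lags" means lags 1 to 3, that the harmonic terms are a single sine/cosine pair with a 52.18-week period, and that the percentile is linearly interpolated (the same rule as numpy.percentile's default). The function names are hypothetical and model fitting is omitted.

```python
import math
from statistics import quantiles


def outbreak_threshold(train_ili, pct=90):
    """pct-th percentile of the training ILI series, linearly interpolated."""
    # quantiles(n=100, method="inclusive") returns the 1st..99th percentiles
    # using the same interpolation as numpy.percentile's default.
    return quantiles(train_ili, n=100, method="inclusive")[pct - 1]


def label_outbreaks(ili, threshold):
    """1 for weeks whose ILI exceeds the threshold, else 0."""
    return [int(x > threshold) for x in ili]


def make_features(ili, pos, weeks, n_lags=3, period=52.18):
    """Lagged ILI and %-positive values plus one annual harmonic pair.

    Row for week t uses only weeks t-1 ... t-n_lags, so it can predict
    week t's label without look-ahead. The first n_lags weeks yield no row.
    """
    rows = []
    for t in range(n_lags, len(ili)):
        row = {f"ili_lag{lag}": ili[t - lag] for lag in range(1, n_lags + 1)}
        row.update({f"pos_lag{lag}": pos[t - lag] for lag in range(1, n_lags + 1)})
        row["sin_annual"] = math.sin(2 * math.pi * weeks[t] / period)
        row["cos_annual"] = math.cos(2 * math.pi * weeks[t] / period)
        rows.append(row)
    return rows
```

Fixing the threshold on the training years only, as the abstract describes, keeps the outbreak labels in the test period from being influenced by test data.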

13
CGM accuracy and reliability compared to point of care testing in older inpatients with comorbid type 2 diabetes and cognitive impairment

Donat-Ergin, B.; Mattishent, K.; Minihane, A. M.; Holt, R.; Murphy, H.; Dhatariya, K.; Hornberger, M.

2026-03-30 endocrinology 10.64898/2026.03.27.26349485 medRxiv
Top 0.2%
6.4%

Background: Older in-patients have a higher prevalence of diabetes and cognitive impairment. Cognitive impairment can make blood glucose management more challenging, since patients might not remember to measure blood glucose or report symptoms. Investigating the accuracy of continuous glucose monitoring (CGM) compared with usual care will inform clinical interpretation in this vulnerable population. Aim: To compare CGM-derived glucose metrics with point-of-care tests (POCT) in older in-patients with T2DM and cognitive impairment, and to investigate CGM accuracy against POCT in the hospital setting in the same population. Methods: Thirty-two older people with comorbid T2DM and cognitive impairment were recruited within a tertiary care hospital in the UK. All participants were naive to CGM and were asked to wear blinded Dexcom G7 sensors for up to 10 days. All participants received usual care during their hospital stay, including the use of POCT. Key accuracy metrics comprised the mean absolute relative difference (MARD), median absolute relative difference (median ARD), Clarke Error Grid (CEG) analysis, and correlation (R²). In addition, the percentage of CGM readings falling within ±20% of reference glucose values when the reference was >5.6 mmol/L, or within ±1.1 mmol/L when the reference was ≤5.6 mmol/L (±20%/1.1 mmol/L), was calculated to assess analytical and clinical accuracy. Results: Thirty participants completed the study. Mean CGM-derived time in range (TIR, 4-10 mmol/L) was 36.23% (min 0%, max 90%), time above range (TAR, ≥10 mmol/L) was 62.87%, and time below range (TBR, ≤3.9 mmol/L) was 1.03%. Based on available POCT readings, mean TIR was 40.84%, TAR was 57.24%, and TBR was 1.81%, similar to the CGM-derived glucose metrics. Comparing the two gave a MARD of 17.4%, a median ARD of 12.2%, and 72.3% of readings within ±20%/1.1 mmol/L. 
CEG analysis showed that 99.3% of data points fell within the clinically acceptable zones (Zones A and B), and there was a strong correlation (R²=0.82) between CGM and POCT. CGM captured more hypoglycaemic readings in our participants. Conclusion: Our study suggests that CGM- and POCT-derived glucose metrics are largely similar for in-patients with diabetes and cognitive impairment. CGM remains a safe and clinically acceptable tool and was able to capture more nocturnal hypoglycaemia than POCT in a subgroup of patients. These initial findings suggest that CGM might be a viable alternative for people with comorbid T2DM and cognitive impairment.
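
The accuracy metrics in the Methods can be computed from paired readings, treating POCT as the reference. The sketch assumes that CGM and POCT values have already been matched in time and are in mmol/L; the pairing rule and the example values are hypothetical.

```python
from statistics import median


def ard(cgm, ref):
    """Absolute relative difference (%) of one CGM reading against its reference."""
    return 100 * abs(cgm - ref) / ref


def mard(pairs):
    """Mean ARD (%) over (cgm, reference) pairs."""
    return sum(ard(c, r) for c, r in pairs) / len(pairs)


def median_ard(pairs):
    """Median ARD (%) over (cgm, reference) pairs."""
    return median([ard(c, r) for c, r in pairs])


def within_20_or_1_1(cgm, ref, cutoff=5.6):
    """±20% agreement when the reference is above `cutoff` mmol/L, else ±1.1 mmol/L."""
    if ref > cutoff:
        return abs(cgm - ref) <= 0.20 * ref
    return abs(cgm - ref) <= 1.1


def percent_within(pairs):
    """Percentage of pairs meeting the ±20%/1.1 mmol/L criterion."""
    return 100 * sum(within_20_or_1_1(c, r) for c, r in pairs) / len(pairs)


# Hypothetical time-matched (CGM, POCT) pairs in mmol/L.
pairs = [(10.0, 8.0), (4.0, 4.5), (12.0, 11.0), (5.0, 6.5)]
print(round(mard(pairs), 2), round(median_ard(pairs), 2), percent_within(pairs))
```

The absolute ±1.1 mmol/L band at low glucose exists because relative error inflates when the reference value is small, which would otherwise penalize CGM most in the hypoglycaemic range.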

14
A Cross-Sectional Study of COVID-19 Vaccine Hesitancy and Behaviours among People Living with HIV in British Columbia

Ejiegbu, A. E.; Shariati, B.; Little, J.; Brondani, M.

2026-02-03 public and global health 10.64898/2026.01.31.26345295 medRxiv
Top 0.2%
6.4%

Objective: Although COVID-19 vaccination is important for people living with HIV (PLHIV) given their elevated infection and comorbidity risks, some PLHIV are hesitant to accept vaccination. We therefore conducted a cross-sectional study in British Columbia, Canada, to identify socio-economic and health-related factors predicting COVID-19 vaccine uptake and contributing to hesitancy among PLHIV. Methods: A 34-item anonymous self-administered survey was disseminated between November 2022 and January 2023 to PLHIV accessing services in British Columbia, via the e-newsletters of HIV- and AIDS-related organisations. The survey covered sociodemographic information, COVID-19 factors, HIV indicators, and the Vaccine Hesitancy Scale. Descriptive and inferential statistics were used to detect significant associations between sociodemographic characteristics, health-related factors, and COVID-19 vaccine uptake, using IBM® SPSS® 28 with a significance level of p<0.05. Results: Of the 276 respondents (mean age 29.93 ± 7.55 years), 54.7% were men, 31.6% identified as sexual minorities, and 46.7% were of Indigenous origin. Approximately 40% of respondents had received at least three vaccine doses, while 82.2% had received at least one dose. Vaccine hesitancy was associated with lower education, age <44 years, and low income. Predictors of COVID-19 vaccine uptake included age [OR=1.06, 95% CI=1.01-1.12], a bachelor's degree [OR=0.22, 95% CI=0.07-0.72], family/friends infected with COVID-19 [OR=3.68, 95% CI=1.56-8.67], HIV viral load >500 copies [OR=0.20, 95% CI=0.06-0.61], belief in vaccine importance [OR=0.51, CI=0.28-0.95], trust in Health Canada's information [OR=0.49, CI=0.29-0.83], and concerns about vaccine adverse effects [OR=0.35, CI=0.22-0.56]. Concerns about vaccine adverse effects were associated with 65% lower odds of receiving three COVID-19 vaccine doses. 
Conclusions: Specific factors that may affect COVID-19 vaccination rates among PLHIV must be considered, including information about vaccine adverse effects, HIV viral load, age, and education level. These insights should guide the development of policies and interventions that encourage individuals to keep their vaccination status up to date.
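The "65%" figure in the Results follows directly from the odds ratio: an OR of 0.35 corresponds to a (0.35 − 1) × 100 = −65% change in the odds. The sketch below shows that arithmetic and, because odds are not the same as probability, also applies the Zhang-Yu approximation for converting an OR to a risk ratio. That conversion is not part of the study; the 50% baseline probability used with it is a hypothetical assumption, and the function names are illustrative.

```python
def odds_change_pct(odds_ratio):
    """Percentage change in odds implied by an odds ratio (OR 0.35 -> -65%)."""
    return (odds_ratio - 1) * 100


def or_to_rr(odds_ratio, baseline_risk):
    """Approximate risk ratio from an odds ratio, given the outcome probability
    in the reference group (Zhang-Yu approximation). baseline_risk is assumed."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


# Adverse-effect concerns: OR = 0.35 (CI 0.22-0.56), as reported above.
for label, value in [("OR", 0.35), ("lower CI", 0.22), ("upper CI", 0.56)]:
    print(f"{label}: {odds_change_pct(value):.0f}% change in odds")

# With a hypothetical 50% baseline probability of receiving three doses,
# the same OR implies a smaller relative reduction in probability.
print(f"RR at p0=0.5: {or_to_rr(0.35, 0.5):.2f}")
```

When the outcome is common, the risk ratio sits closer to 1 than the odds ratio, so a 65% reduction in odds should not be read as a 65% reduction in probability.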

15
High-Resolution District Level Contraceptive Prevalence in Pakistan Using a Bayesian Small Area Estimation Approach

Ibrahim, M.; Naz, O.; Javeed, A.; Irum, A.; Khan, A.; Khan, A. A.

2026-02-28 public and global health 10.64898/2026.02.25.26347119 medRxiv
Top 0.2%
6.4%

Introduction: National surveys in Pakistan are typically representative only at the national or provincial level, leaving large uncertainties in district-level contraceptive prevalence. This obscures local heterogeneity and limits data-driven program planning. Administrative data, although more frequent and detailed, are often underused because of reporting and measurement challenges. This study develops a multi-source small area estimation (SAE) framework to generate district-level estimates of the contraceptive prevalence rate (CPR) and modern contraceptive prevalence rate (mCPR) using routine commodities data. Methods: A two-stage Bayesian SAE model was constructed to integrate survey, supply, and census data. In Stage 1, contraceptive dispensation data from the Contraceptive Logistics Management Information System (cLMIS) were converted into inferred users, normalized by the number of married women of reproductive age (MWRA) from the 2023 Census, and scaled to provincial CPR benchmarks from the Pakistan Social and Living Standards Measurement Survey (PSLM). In Stage 2, a bivariate hierarchical Bayesian model jointly estimated CPR and mCPR, accounting for measurement error and borrowing statistical strength from socioeconomic and demographic covariates. Convergence and model stability were assessed through standard diagnostics (R-hat, effective sample size [ESS], Bayesian fraction of missing information [BFMI], and divergence checks). Results: District-level estimates were produced for 121 districts. CPR ranged from 9% to 46% and mCPR from 6% to 35%. Aggregated provincial estimates were consistent with PSLM benchmarks (within ±0.6 percentage points). Comparison with published district studies showed mean absolute deviations of around 4 percentage points. Conclusion: The Bayesian SAE framework generates statistically coherent, high-resolution contraceptive prevalence estimates, substantially improving visibility into geographic inequities in Pakistan's family planning landscape. 
These granular metrics offer policymakers an actionable basis for prioritizing underserved districts and tailoring context-sensitive interventions.
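Stage 1 of the approach described above (dispensed commodities to inferred users to a per-MWRA rate, then scaling to a provincial benchmark) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the units-per-user-year conversion factors, the district data, and the choice of a single multiplicative scaling factor per province are all assumptions, and the Stage 2 bivariate hierarchical Bayesian model is not shown.

```python
def stage1_district_cpr(dispensed, units_per_user_year, mwra, provincial_benchmark):
    """Stage-1 sketch: dispensed commodities -> inferred users -> district CPR,
    rescaled so the MWRA-weighted provincial CPR equals the survey benchmark.

    dispensed: {district: {method: units dispensed in a year}}
    units_per_user_year: {method: units one user consumes per year} (assumed)
    mwra: {district: married women of reproductive age}
    provincial_benchmark: provincial CPR as a proportion (e.g. from a survey)
    """
    # 1. Convert dispensed units into inferred users per district.
    users = {
        d: sum(units / units_per_user_year[m] for m, units in by_method.items())
        for d, by_method in dispensed.items()
    }
    # 2. Normalize by the MWRA denominator.
    raw_cpr = {d: users[d] / mwra[d] for d in users}
    # 3. Scale all districts by one factor so the province matches the benchmark.
    raw_provincial = sum(users.values()) / sum(mwra[d] for d in users)
    scale = provincial_benchmark / raw_provincial
    return {d: raw_cpr[d] * scale for d in raw_cpr}


# Hypothetical inputs for a two-district province.
estimates = stage1_district_cpr(
    dispensed={"District A": {"pill_cycles": 130}, "District B": {"injectables": 40}},
    units_per_user_year={"pill_cycles": 13, "injectables": 4},
    mwra={"District A": 100, "District B": 50},
    provincial_benchmark=0.20,
)
print(estimates)
```

A single scaling factor preserves the relative ranking of districts implied by the supply data while forcing agreement with the provincial benchmark; the Stage 2 model then adds uncertainty and covariate information on top of these inputs.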

16
Assessing the Impact of Timing and Coverage of United States COVID-19 Vaccination Campaigns: A Multi-Model Approach

Nande, A.; Larsen, S. L.; Turtle, J.; Davis, J. T.; Bandekar, S. R.; Lewis, B.; Chen, S.; Contamin, L.; Jung, S.-m.; Howerton, E.; Shea, K.; Bay, C.; Ben-Nun, M.; Bi, K.; Bouchnita, A.; Chen, J.; Chinazzi, M.; Fox, S. J.; Hill, A. L.; Hochheiser, H.; Lemaitre, J. C.; Loo, S. L.; Marathe, M.; Meyers, L. A.; Pearson, C. A. B.; Porebski, P.; Przykucki, E.; Smith, C. P.; Venkatramanan, S.; Vespignani, A.; Willard, T. C.; Yan, K.; Viboud, C.; Lessler, J.; Truelove, S.

2026-04-08 public and global health 10.64898/2026.04.07.26349269 medRxiv
Top 0.2%
6.3%

Background: Six years after its emergence, SARS-CoV-2 continues to impose a substantial burden. The impact of vaccination and the optimal timing of its rollout remain uncertain given existing population immunity and variability in outbreak timing between summer and winter. Methods: The US Scenario Modeling Hub convened its 19th round of ensemble projections of COVID-19 hospitalizations and deaths in the United States, in which eight teams projected trajectories for each US state and nationally from April 2025 to April 2026 under five scenarios regarding vaccine recommendations and timing. Two eligibility scenarios (high-risk individuals only and all-eligible) were crossed with two timing scenarios (classic start: mid-August; earlier start: late June) to create four vaccination scenarios, which were compared against a counterfactual scenario with no vaccination. Findings: Compared with no vaccination, our ensemble projections estimated that 90,000 (95% PI 53,000-126,000) hospitalizations would be averted across the US in the high-risk, classic-timing scenario. Expanding eligibility to all-eligible age groups averted an additional 26,000 (95% PI 14,000-39,000) hospitalizations, and combining this with earlier vaccination timing was projected to reduce national hospitalizations by a further 15,000 (95% PI -3,000 to 33,000). The majority of teams projected both summer and winter waves. Implications: We project that COVID-19 will cause substantial numbers of hospitalizations and deaths in the US in the 2025-26 season and estimate significant benefits from a broad all-eligible vaccination recommendation. The results also suggest that an earlier vaccination campaign is likely to yield additional benefit. Funding: Centers for Disease Control and Prevention; National Institutes of Health (US); National Science Foundation (US)
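The scenario design described in the Methods is a 2×2 factorial (eligibility × timing) plus a no-vaccination counterfactual, giving five scenarios in total. The sketch below enumerates that grid and defines "averted" outcomes as the counterfactual burden minus the scenario burden. The labels are taken from the abstract; the `averted` helper is a hypothetical illustration, and the Hub's ensemble method for combining team projections is not reproduced here.

```python
from itertools import product

# Two eligibility levels crossed with two timing levels (from the abstract).
ELIGIBILITY = ["high-risk only", "all-eligible"]
TIMING = ["classic start (mid-August)", "earlier start (late June)"]

scenarios = [{"eligibility": e, "timing": t} for e, t in product(ELIGIBILITY, TIMING)]
# The counterfactual reference scenario with no vaccination.
scenarios.append({"eligibility": "none", "timing": "no vaccination"})


def averted(counterfactual_burden, scenario_burden):
    """Outcomes (e.g. hospitalizations) averted relative to the counterfactual."""
    return counterfactual_burden - scenario_burden


for s in scenarios:
    print(s["eligibility"], "|", s["timing"])
```

Because each vaccination scenario is compared against the same counterfactual, differences between scenarios (such as the additional benefit of all-eligible recommendations) can be read off as differences in averted burden.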

17
Travel Time as a Predictor of Missed Appointments and Telemedicine Utilization in a Rural Outpatient Clinic: A Retrospective Cross-Sectional Observational Study

Graves, P.; Jacobsen, C.; Ho, A.; Johnson, D.; Weaver, D.

2026-03-25 primary care research 10.64898/2026.03.20.26348551 medRxiv
Top 0.2%
6.3%

Background: Rural populations face disproportionate barriers to healthcare access, often due to geographic isolation and limited provider availability. Prior studies have shown that longer travel time negatively affects appointment adherence. Telemedicine has emerged as a potential solution, but its utilization in rural populations is not yet well understood. Methods: This retrospective cross-sectional observational study analyzed all scheduled appointments (n=5,548) at a single rural family medicine clinic in the Pacific Northwest of the United States during 2024. One-way travel times were calculated using the Google Maps Distance Matrix API and categorized into Short (<15 minutes), Medium (15-30 minutes), and Long (>30 minutes) commute groups. Proportions of utilization and cancellation for both telemedicine and in-person appointments were compared across commute groups using chi-square tests (p < 0.05 considered significant). Results: Overall, the proportion of cancellations was significantly higher among patients with Long commutes (36.2%) than among those with Medium (31.0%) and Short (32.2%) commutes (p < 0.001). Telemedicine utilization increased with commute time (7.7% for Long-commute patients vs. 1.5% for Short; p < 0.001). However, telemedicine cancellation proportions did not differ significantly across groups (21.2% for Long, 13.3% for Medium, 17.0% for Short; p = 0.122), suggesting comparable telemedicine adherence regardless of distance. Both in-person appointment utilization and cancellation proportions were greatest in the Short commute group. Conclusion: Longer travel times are associated with more appointment cancellations among rural patients, reinforcing travel burden as a key barrier to care. Telemedicine use increases with commute time, and its adherence is consistent across groups, indicating its value as a tool to address rural healthcare gaps. 
These findings support the continued expansion of telehealth infrastructure to improve care for geographically isolated populations.
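The grouping and testing steps described in the Methods above can be sketched in plain Python. The sketch assumes travel times have already been retrieved (the Google Maps API call is not shown) and that exactly 15 or 30 minutes falls in the Medium group, since the abstract does not specify boundary handling. The appointment records are hypothetical and far too few for a valid chi-square test in practice. For a 3×2 table the test has 2 degrees of freedom, where the chi-square survival function is exactly exp(−x/2), so no statistics library is required.

```python
import math


def commute_group(minutes):
    """Map one-way travel time (minutes) to the abstract's commute groups.
    Assumption: exactly 15 or 30 minutes counts as Medium."""
    if minutes < 15:
        return "Short"
    if minutes <= 30:
        return "Medium"
    return "Long"


def chi_square_3x2(table):
    """Pearson chi-square test of independence for a 3x2 table.

    table: three rows (Short, Medium, Long), each [cancelled, kept].
    With df = (3 - 1) * (2 - 1) = 2, the chi-square survival function
    reduces to exp(-x / 2).
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 3x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# Hypothetical appointment records: (one-way travel minutes, cancelled?).
appointments = [(8, False), (12, True), (22, False), (25, False), (40, True), (55, True)]
counts = {g: [0, 0] for g in ("Short", "Medium", "Long")}
for minutes, cancelled in appointments:
    counts[commute_group(minutes)][0 if cancelled else 1] += 1

stat, p = chi_square_3x2([counts["Short"], counts["Medium"], counts["Long"]])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With real data, a library routine such as SciPy's contingency-table test would normally be used; the closed form here only holds for tables with exactly 2 degrees of freedom.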

18
A Web Application for Exploring Distribution in Academic Publications Across Geography and Institutions in India

Hou, Y.; Cohen, E.; Higginbottom, J.; Rountree, L.; Ren, Y.; Wahl, B.; Nyhan, K.; Mukherjee, B.

2026-03-20 health informatics 10.64898/2026.03.18.26348755 medRxiv
Top 0.2%
6.2%

India's national research capacity and infrastructure are unevenly distributed across states and union territories (UTs), contributing to geographic variation in academic publication output. We developed Indiapub, an open-access web application that quantitatively enumerates and visually displays geographic and temporal publication patterns for research products with at least one author affiliated with an Indian institution, using OpenAlex data. The app is designed for ease of use, with automated data retrieval, cleaning, and aggregation. Indiapub allows users to filter publications by topic, publication year range, author position, publication type, minimum citation count, state/UT, and population size of the state/UT where the author institution is located. The app also provides downloadable tables and ranked institution lists by publication count. Its interactive dashboard includes five modules: (i) a map of publication distribution, (ii) time trend plots for nation and state/UT, (iii) publication-share versus population-share plots highlighting over- and underrepresentation, (iv) stacked bar charts of state/UT contributions over time with population benchmarks, and (v) bubble plots relating the Human Development Index to publication volume over time. This tool may support resource prioritization and identification of institutional strengths for trainees, researchers, higher education administrators, and policymakers. To illustrate its utility, we present sample findings derived from the app. For publications across all topics from 2014 to 2025, the largest research participation footprints were observed in Tamil Nadu, Maharashtra, Delhi, Uttar Pradesh, and Karnataka. Tamil Nadu and Delhi were home to three of the highest-publishing institutions nationally: Vellore Institute of Technology, All India Institute of Medical Sciences, and Indian Institute of Technology Delhi. 
We also examined six curated case studies of broad scientific interest: electronic health records (EHR), genome-wide association studies (GWAS), artificial intelligence (AI), development economics, environmental science, and COVID-19. Findings from these case studies revealed over- and underrepresentation in publication output across states and UTs. For example, in EHR publications among high-population states, Tamil Nadu's publication share exceeded its population share by 31.3 percentage points (pp), whereas Bihar's was 12.8 pp lower. Our tool offers insights into India's research landscape across states and UTs with easy-to-digest visuals. Such interactive tools have the potential to serve as a starting point for fostering a more inclusive research ecosystem supporting targeted research policy and planning.
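The over- and underrepresentation measure used above (publication share minus population share, in percentage points) is straightforward to compute. The sketch below is a hypothetical illustration, not Indiapub's implementation: the region names and counts are invented, shares are computed against the totals of whatever regions are passed in, and each publication is assumed to be counted once per state/UT in which an author institution is located.

```python
def representation_gap_pp(publications, population):
    """Publication share minus population share, in percentage points, per region.

    publications, population: {state_or_ut: count}. Shares are computed against
    the totals of the dicts passed in, so pass every state/UT for national shares.
    """
    total_pubs = sum(publications.values())
    total_pop = sum(population.values())
    return {
        s: 100 * (publications[s] / total_pubs - population[s] / total_pop)
        for s in publications
    }


# Hypothetical three-region example (not Indiapub data).
gaps = representation_gap_pp(
    publications={"Region X": 500, "Region Y": 300, "Region Z": 200},
    population={"Region X": 200, "Region Y": 300, "Region Z": 500},
)
for region, gap in gaps.items():
    print(f"{region}: {gap:+.1f} pp")
```

A positive value means a region produces a larger share of publications than its share of the population, which is how a figure such as Tamil Nadu's +31.3 pp for EHR publications should be read.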

19
Assessing AI tool use among New York State clinicians

Galfano, A.; Barbosu, C. M.; Aladin, B.; Rivera, I.; Dye, T. D. V.

2026-01-30 primary care research 10.64898/2026.01.29.26345129 medRxiv
Top 0.2%
6.1%

Artificial intelligence (AI) is dramatically changing the healthcare landscape by providing patients, clinicians, administrators, and public health professionals with tools aimed at improving efficiency, outcomes, and experience in health. As elsewhere, New York State (NYS) is experiencing high demand for, and high investment in, AI-driven healthcare transformation, yet little is known about clinicians' use of, and interest in adopting, AI tools in their work. A large share of the nation's future primary care clinicians train and work in NYS, so the state's ability to establish clear policies, provide tools, and build AI competency has implications for care delivery nationally. We therefore analyzed NYS clinicians' use of AI to better understand opportunities for its adoption and for its inclusion in continuing education. The analysis included healthcare providers delivering ambulatory or specialty medical care within NYS; the main outcome was clinicians' use of AI tools in their work, including frequency and purpose of use. Of 305 responding NYS clinical providers, 23.4% indicated that they use AI tools for work, and 11.1% reported monthly use, 8.5% weekly use, and 4.6% daily use. AI was primarily used to search guidelines and ask clinical questions, followed by identifying drug interactions, analyzing data, analyzing images/labs, and creating care plans and patient recommendations. AI use did not vary significantly across professional disciplines or practice types, though independent practitioners were significantly more likely than advanced practice providers to use AI in their work, as were providers using social media and digital methods to obtain continuing education. AI use increased substantially in 2025 compared with 2024. 
Overall, programs targeting clinicians could draw on these findings to design accessible and acceptable AI-related continuing education opportunities that familiarize clinicians with the opportunities and risks of integrating AI tools into their practices. Author Summary: AI tools are rapidly gaining traction in healthcare delivery. We found that clinician use of AI was quite limited (23%), though growing. Those using AI tools used them sparingly in their work, with only about 5% reporting daily use. The purposes for which clinicians reported using AI (asking clinical questions, interpreting patient results, creating patient educational materials) could contribute substantially to healthcare outcomes if widely adopted. Designers of continuing education for clinicians should provide opportunities for clinicians to improve their familiarity, use, and competency with AI tools, to help maximize the potential health benefits for patients and communities.

20
Association of sexual orientation outness and recent homophobic violence with not being on antiretroviral treatment: Analysis of a Latin American Survey in men who have sex with men living with HIV

ENCISO DURAND, J. C.; Silva-Santisteban, A. A.; Reyes-Diaz, M.; Huicho, L.; Caceres, C. F.; LAMIS-2018,

2026-04-23 public and global health 10.64898/2026.04.22.26351515 medRxiv
Top 0.2%
5.1%

Objectives: In Latin America, up-to-date information for monitoring the UNAIDS 95-95-95 HIV targets in key populations, such as men who have sex with men (MSM), is limited. Elsewhere, structural homophobia restricts access to antiretroviral therapy (ART). Conceptual frameworks suggest that intersecting forms of violence and discrimination may negatively influence HIV care outcomes through psychosocial and structural pathways, although empirical evidence remains limited. The study aimed to assess whether sexual orientation outness and recent homophobic violence are associated with not being on ART among Latin American MSM living with HIV. Methods: This cross-sectional study is a secondary analysis of LAMIS-2018 data, including 7,609 MSM aged 18 years or older from 18 Latin American countries who had received an HIV diagnosis ≥1 year earlier. Participants self-reported ART status, sociodemographic characteristics, homophobic violence, and sexual orientation outness. Bivariate and multivariable logistic regressions identified the factors associated with not being on ART. Results: Nine percent of MSM with HIV were not on ART, 18% reported low sexual orientation outness, and 27% had experienced homophobic violence, especially in Andean and Central American countries. Not being on ART was associated with recent homophobic violence (adjusted prevalence ratio [aPR]=1.25), low outness (aPR=1.22), unemployment (aPR=1.27), and residence in the Andean subregion (aPR=1.87), Mexico (aPR=1.28), or the Southern Cone (aPR=1.45) versus Brazil. Protective factors included older age (25-39: aPR=0.72; >39: aPR=0.49), living in large cities (aPR=0.72), having a stable partner (aPR=0.78), and university education (aPR=0.74). Conclusions: Recent homophobic violence and low sexual orientation outness were associated with not being on ART among MSM in Latin America. While access varies across countries, structural factors such as stigma and violence may limit engagement in care. 
Addressing these barriers alongside strengthening health systems may be key to improving ART uptake and advancing progress toward the 95-95-95 targets.
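The associations above are reported as prevalence ratios. As a reading aid, the sketch below computes a crude (unadjusted) prevalence ratio with a log-scale 95% confidence interval from a 2×2 table. It is not the study's analysis: the adjusted ratios above additionally control for covariates through a regression model that is not reproduced here, and the counts in the example are hypothetical.

```python
import math


def prevalence_ratio(exposed_cases, exposed_n, unexposed_cases, unexposed_n, z=1.96):
    """Crude prevalence ratio with a log-scale (Katz) confidence interval.

    Unadjusted: covariate adjustment requires a regression model (not shown).
    """
    p1 = exposed_cases / exposed_n
    p0 = unexposed_cases / unexposed_n
    pr = p1 / p0
    # Standard error of log(PR) for two independent proportions.
    se = math.sqrt(1 / exposed_cases - 1 / exposed_n + 1 / unexposed_cases - 1 / unexposed_n)
    return pr, pr * math.exp(-z * se), pr * math.exp(z * se)


# Hypothetical counts: 25/200 not on ART among those reporting violence vs 20/200 without.
pr, lo, hi = prevalence_ratio(25, 200, 20, 200)
print(f"PR = {pr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

A PR of 1.25 means the prevalence of not being on ART is 25% higher in the exposed group than in the unexposed group, which is the same reading as the aPR=1.25 reported for recent homophobic violence once covariates are accounted for.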